This content is the Seurat PBMC tutorial, with an additional section that uses Monocle for pseudotime trajectory analysis.
Please see the installation page for instructions on installing the libraries used in this workshop and on downloading the raw data.
For this tutorial, we will analyze a dataset of Peripheral Blood Mononuclear Cells (PBMC) that is freely available from 10X Genomics. It contains 2,700 single cells that were sequenced on the Illumina NextSeq 500. The raw data can be found here.
We start by reading in the data. The Read10X() function
reads in the output of the cellranger
pipeline from 10X, returning a unique molecular identifier (UMI) count
matrix. The values in this matrix represent the number of molecules for
each feature (i.e. gene; row) that are detected in each cell
(column).
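The sketch below makes the structure of the `Read10X()` output concrete. It roughly reproduces what that function does for a CellRanger v2-style folder such as `filtered_gene_bc_matrices/hg19/`, which holds `matrix.mtx`, `genes.tsv` and `barcodes.tsv`. It writes a tiny hypothetical dataset to a temporary folder and reads it back with the Matrix package's `readMM()`.

The helper `read_10x_sketch()` and all gene IDs and barcodes are made up for illustration. The real `Read10X()` handles more cases, such as the gzipped `features.tsv.gz` layout of newer CellRanger versions. Treat this as an approximation, not its actual implementation.

```r
library(Matrix)

# Hypothetical toy UMI counts: 3 genes (rows) x 4 cells (columns)
toy_counts <- Matrix(c(4, 0, 0, 1,
                       0, 6, 0, 0,
                       0, 0, 0, 2),
                     nrow = 3, byrow = TRUE, sparse = TRUE)

# Write the three files found in a CellRanger (v2-style) output folder
toy_dir <- file.path(tempdir(), "toy_10x")
dir.create(toy_dir, showWarnings = FALSE)
writeMM(toy_counts, file.path(toy_dir, "matrix.mtx"))
write.table(data.frame(id = c("ENSG_A", "ENSG_B", "ENSG_C"),
                       symbol = c("CD3D", "MS4A1", "TCL1A")),
            file.path(toy_dir, "genes.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
writeLines(c("AAAC-1", "AAAG-1", "AACC-1", "AACG-1"),
           file.path(toy_dir, "barcodes.tsv"))

# Hypothetical simplified reader: counts from matrix.mtx, gene symbols
# (2nd column of genes.tsv, which we understand to be Read10X's default)
# as row names, and cell barcodes as column names
read_10x_sketch <- function(data.dir) {
  mat <- as(readMM(file.path(data.dir, "matrix.mtx")), "CsparseMatrix")
  genes <- read.delim(file.path(data.dir, "genes.tsv"), header = FALSE,
                      stringsAsFactors = FALSE)
  rownames(mat) <- make.unique(genes[[2]])
  colnames(mat) <- readLines(file.path(data.dir, "barcodes.tsv"))
  mat
}

toy_data <- read_10x_sketch(toy_dir)
toy_data  # a 3 x 4 dgCMatrix: genes x cells
```

The result is a genes-by-cells sparse matrix, the same shape of object that `pbmc.data` holds for the real dataset.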
We next use the count matrix to create a Seurat object.
The object serves as a container that holds both data (like the count
matrix) and analysis results (like PCA or clustering) for a single-cell
dataset. For a technical discussion of the Seurat object
structure, check out the GitHub Wiki. For
example, the count matrix is stored in
pbmc[["RNA"]]@counts.
library(dplyr)
library(ggplot2)
library(Seurat)
library(patchwork)
# Load the PBMC dataset
pbmc.data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
## An object of class Seurat
## 13714 features across 2700 samples within 1 assay
## Active assay: RNA (13714 features, 0 variable features)
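The `min.cells` and `min.features` arguments filter the raw matrix before the object is built:

- genes must be detected in at least `min.cells` cells;
- cells must have at least `min.features` detected genes.

The sketch below reproduces that filtering on a hypothetical 4 x 5 matrix using the Matrix package. The helper `filter_counts_sketch()` is our own. Applying the cell filter before the gene filter reflects our understanding of SeuratObject's behaviour; this tutorial does not state it, so treat the order as an assumption.

```r
library(Matrix)

# Hypothetical helper mimicking min.features / min.cells filtering
filter_counts_sketch <- function(counts, min.cells = 0, min.features = 0) {
  # 1. keep cells (columns) with at least `min.features` detected genes
  n_features <- colSums(counts > 0)
  counts <- counts[, n_features >= min.features, drop = FALSE]
  # 2. keep genes (rows) detected in at least `min.cells` of the remaining cells
  n_cells <- rowSums(counts > 0)
  counts[n_cells >= min.cells, , drop = FALSE]
}

# Hypothetical toy counts: 4 genes x 5 cells (cell3 and gene4 are all zero)
toy <- Matrix(c(1, 2, 0, 3, 1,
                0, 1, 0, 0, 0,
                5, 0, 0, 1, 2,
                0, 0, 0, 0, 0),
              nrow = 4, byrow = TRUE, sparse = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))

filtered <- filter_counts_sketch(toy, min.cells = 3, min.features = 1)
filtered  # gene1 and gene3 across cell1, cell2, cell4, cell5
```

In the PBMC data, the object still has 2,700 samples. All cells therefore pass `min.features = 200`, and only the gene filter changes the matrix, leaving 13,714 features.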
# Let's examine a few genes in the first thirty cells
pbmc.data[c("CD3D", "TCL1A", "MS4A1"), 1:30]
## 3 x 30 sparse Matrix of class "dgCMatrix"
##
## CD3D 4 . 10 . . 1 2 3 1 . . 2 7 1 . . 1 3 . 2 3 . . . . . 3 4 1 5
## TCL1A . . . . . . . . 1 . . . . . . . . . . . . 1 . . . . . . . .
## MS4A1 . 6 . . . . . . 1 1 1 . . . . . . . . . 36 1 2 . . 2 . . . .
The . values in the matrix represent 0s (no molecules
detected). Since most values in an scRNA-seq matrix are 0, Seurat uses a
sparse-matrix representation whenever possible. This results in
significant memory and speed savings for Drop-seq/inDrop/10x data.
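Under the hood, the count matrix is a `dgCMatrix` from the Matrix package. This class uses compressed sparse column (CSC) storage, which keeps only three pieces:

- the non-zero values (`@x`);
- their 0-based row indices (`@i`);
- one pointer per column boundary (`@p`).

The small hypothetical matrix below shows how these three slots encode it.

```r
library(Matrix)

# Hypothetical 3 genes x 4 cells count matrix
m <- Matrix(c(0, 4, 0, 0,
              1, 0, 0, 5,
              0, 2, 3, 0),
            nrow = 3, byrow = TRUE, sparse = TRUE)

m@x  # non-zero values, read column by column: 1 4 2 3 5
m@i  # 0-based row index of each stored value: 1 0 2 2 1
m@p  # column pointers: 0 1 3 4 5 (one more entry than there are columns)

# The values of column j are m@x[(m@p[j] + 1):m@p[j + 1]]
m@x[(m@p[2] + 1):m@p[3]]  # column 2's non-zero values: 4 2

nnzero(m) / prod(dim(m))  # fraction of entries actually stored: 5/12
```

The same layout explains the `str()` output of the PBMC object further down. There, `@p` has 2,701 entries for 2,700 cells. `@x` holds 2,282,976 non-zero counts out of 13,714 x 2,700 (about 37 million) possible entries, roughly 6% of the matrix.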
dense.size <- object.size(as.matrix(pbmc.data))
dense.size
## 709591472 bytes
sparse.size <- object.size(pbmc.data)
sparse.size
## 29905192 bytes
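dense.size/sparse.size  # a ~23.7-fold ratio; the "bytes" unit printed below is carried over from object.size() and is not meaningful here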
## 23.7 bytes
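A back-of-the-envelope estimate shows where this saving comes from:

- a dense numeric matrix needs 8 bytes per entry;
- a `dgCMatrix` needs about 12 bytes per non-zero entry (an 8-byte double in `@x` plus a 4-byte integer in `@i`), plus 4 bytes per column pointer.

The ratio is therefore roughly 8 / (12 x density), so the sparser the data, the bigger the saving. The sketch below checks this on a simulated 1,000 x 1,000 matrix with 5% non-zero entries. The simulation (Poisson-like counts placed at random positions) is an assumption for illustration and is not the PBMC data.

```r
library(Matrix)

set.seed(42)
# Simulated (hypothetical) 1,000 x 1,000 count-like matrix, 5% non-zero
m <- rsparsematrix(1000, 1000, density = 0.05,
                   rand.x = function(n) as.numeric(rpois(n, 2) + 1))

dense_bytes  <- as.numeric(object.size(as.matrix(m)))
sparse_bytes <- as.numeric(object.size(m))

# Rough estimates that ignore small fixed overheads (headers, attributes)
est_dense  <- 8 * nrow(m) * ncol(m)               # one 8-byte double per entry
est_sparse <- 12 * nnzero(m) + 4 * (ncol(m) + 1)  # x + i per non-zero, plus p

round(c(measured = dense_bytes / sparse_bytes,
        estimated = est_dense / est_sparse), 1)
```

Both ratios should land near 8 / (12 x 0.05), about 13. A lower non-zero fraction, as in a typical 10x matrix, pushes the ratio higher.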
Let's take a look at the Seurat object we have just created in R:
pbmc
To accommodate the complexity of data arising from a single-cell RNA-seq experiment, the Seurat object acts as a container of multiple linked data tables.
The functions in Seurat can access parts of the data object for analysis and visualisation; we will cover this later on.
There are a couple of concepts to discuss here. A Seurat object is an instance of an R class: a data container that can be accessed as a variable in the R environment.
Classes are pre-defined and can contain multiple data tables and metadata, each stored in a named slot. For Seurat, there are three types.
Many of the functions in Seurat operate seamlessly on the data class and
the slots within it. There may be occasions when you need to access these
slots separately to modify them directly; however, this is an advanced analysis
method.
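The `str()` output below reports `Formal class 'Seurat'`, which marks it as an S4 object, so the class/slot vocabulary above is the standard S4 one. The sketch below uses a hypothetical `MiniSeurat` class, far simpler than the real `Seurat` class. It shows three things:

- how a class definition fixes the slots;
- how `@` and `slot()` read them;
- how R rejects a value of the wrong type.

```r
# Hypothetical toy class: NOT the real Seurat class, just three typed slots
setClass("MiniSeurat",
         slots = c(counts = "matrix",
                   meta.data = "data.frame",
                   project.name = "character"))

counts <- matrix(c(4, 0, 1, 6), nrow = 2,
                 dimnames = list(c("CD3D", "MS4A1"), c("cell1", "cell2")))

obj <- new("MiniSeurat",
           counts = counts,
           meta.data = data.frame(nCount = colSums(counts),
                                  row.names = colnames(counts)),
           project.name = "toy")

slotNames(obj)          # "counts" "meta.data" "project.name"
obj@project.name        # read a slot with @
slot(obj, "meta.data")  # same idea, with the slot name given as a string

# Slots are typed: assigning a number to a character slot is an error
res <- tryCatch({
  obj@project.name <- 42
  "accepted"
}, error = function(e) "rejected")
res                     # "rejected"
```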
Let's take a look at some things within the Seurat object. We can do
this with the str function in R.
What is in the meta.data slot within your Seurat object
currently? What type of data is contained here?
Where is our count data within the Seurat object?
# To look at our Seurat object
str(pbmc)
## Formal class 'Seurat' [package "SeuratObject"] with 13 slots
## ..@ assays :List of 1
## .. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
## .. .. .. ..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. .. .. .. .. ..@ i : int [1:2282976] 29 73 80 148 163 184 186 227 229 230 ...
## .. .. .. .. .. ..@ p : int [1:2701] 0 779 2131 3260 4220 4741 5522 6304 7094 7626 ...
## .. .. .. .. .. ..@ Dim : int [1:2] 13714 2700
## .. .. .. .. .. ..@ Dimnames:List of 2
## .. .. .. .. .. .. ..$ : chr [1:13714] "AL627309.1" "AP006222.2" "RP11-206L10.2" "RP11-206L10.9" ...
## .. .. .. .. .. .. ..$ : chr [1:2700] "AAACATACAACCAC-1" "AAACATTGAGCTAC-1" "AAACATTGATCAGC-1" "AAACCGTGCTTCCG-1" ...
## .. .. .. .. .. ..@ x : num [1:2282976] 1 1 2 1 1 1 1 41 1 1 ...
## .. .. .. .. .. ..@ factors : list()
## .. .. .. ..@ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. .. .. .. .. ..@ i : int [1:2282976] 29 73 80 148 163 184 186 227 229 230 ...
## .. .. .. .. .. ..@ p : int [1:2701] 0 779 2131 3260 4220 4741 5522 6304 7094 7626 ...
## .. .. .. .. .. ..@ Dim : int [1:2] 13714 2700
## .. .. .. .. .. ..@ Dimnames:List of 2
## .. .. .. .. .. .. ..$ : chr [1:13714] "AL627309.1" "AP006222.2" "RP11-206L10.2" "RP11-206L10.9" ...
## .. .. .. .. .. .. ..$ : chr [1:2700] "AAACATACAACCAC-1" "AAACATTGAGCTAC-1" "AAACATTGATCAGC-1" "AAACCGTGCTTCCG-1" ...
## .. .. .. .. .. ..@ x : num [1:2282976] 1 1 2 1 1 1 1 41 1 1 ...
## .. .. .. .. .. ..@ factors : list()
## .. .. .. ..@ scale.data : num[0 , 0 ]
## .. .. .. ..@ key : chr "rna_"
## .. .. .. ..@ assay.orig : NULL
## .. .. .. ..@ var.features : logi(0)
## .. .. .. ..@ meta.features:'data.frame': 13714 obs. of 0 variables
## .. .. .. ..@ misc : list()
## ..@ meta.data :'data.frame': 2700 obs. of 3 variables:
## .. ..$ orig.ident : Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ nCount_RNA : num [1:2700] 2419 4903 3147 2639 980 ...
## .. ..$ nFeature_RNA: int [1:2700] 779 1352 1129 960 521 781 782 790 532 550 ...
## ..@ active.assay: chr "RNA"
## ..@ active.ident: Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "names")= chr [1:2700] "AAACATACAACCAC-1" "AAACATTGAGCTAC-1" "AAACATTGATCAGC-1" "AAACCGTGCTTCCG-1" ...
## ..@ graphs : list()
## ..@ neighbors : list()
## ..@ reductions : list()
## ..@ images : list()
## ..@ project.name: chr "pbmc3k"
## ..@ misc : list()
## ..@ version :Classes 'package_version', 'numeric_version' hidden list of 1
## .. ..$ : int [1:3] 4 0 4
## ..@ commands : list()
## ..@ tools : list()
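The `str()` output shows the nesting: the `assays` slot is a list whose `RNA` element is itself an `Assay` object with a `counts` slot. The sketch below rebuilds that nesting with two hypothetical toy classes, `ToyAssay` and `ToyObject`. It adds a `[[` method so that `toy[["RNA"]]` returns the same object as `toy@assays$RNA`.

It only mimics the accessor patterns used in this tutorial (`pbmc[["RNA"]]@counts` and `pbmc@assays$RNA@counts`). It is not how SeuratObject is actually written.

```r
# Hypothetical toy classes mimicking the Seurat -> assays -> Assay nesting
setClass("ToyAssay", slots = c(counts = "matrix", key = "character"))
setClass("ToyObject", slots = c(assays = "list", meta.data = "data.frame"))

# Hypothetical `[[` method: look up an assay by name in the assays list
setMethod("[[", "ToyObject", function(x, i, j, ...) x@assays[[i]])

counts <- matrix(c(4, 0, 1, 6), nrow = 2,
                 dimnames = list(c("CD3D", "MS4A1"), c("cell1", "cell2")))
toy <- new("ToyObject",
           assays = list(RNA = new("ToyAssay", counts = counts, key = "rna_")),
           meta.data = data.frame(orig.ident = factor(c("toy", "toy")),
                                  row.names = colnames(counts)))

str(toy, max.level = 2)  # top-level slots and what they hold
toy@assays$RNA@counts    # chain of accessors, like pbmc@assays$RNA@counts
toy[["RNA"]]@counts      # the same matrix, reached via the `[[` method
```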
# To access the meta.data slot
head(pbmc@meta.data, n = 5)
| | orig.ident | nCount_RNA | nFeature_RNA |
|---|---|---|---|
| AAACATACAACCAC-1 | pbmc3k | 2419 | 779 |
| AAACATTGAGCTAC-1 | pbmc3k | 4903 | 1352 |
| AAACATTGATCAGC-1 | pbmc3k | 3147 | 1129 |
| AAACCGTGCTTCCG-1 | pbmc3k | 2639 | 960 |
| AAACCGTGTATGCG-1 | pbmc3k | 980 | 521 |
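The `nCount_RNA` and `nFeature_RNA` columns can be derived directly from the count matrix:

- `nCount_RNA` is the total number of UMIs per cell (a column sum);
- `nFeature_RNA` is the number of genes with at least one UMI in that cell.

Here is a minimal sketch on a hypothetical 3-gene x 4-cell matrix.

```r
library(Matrix)

# Hypothetical 3 genes x 4 cells UMI count matrix (cell4 has no UMIs)
counts <- Matrix(c(4, 0, 10, 0,
                   0, 6,  0, 0,
                   1, 1,  0, 0),
                 nrow = 3, byrow = TRUE, sparse = TRUE,
                 dimnames = list(c("CD3D", "MS4A1", "TCL1A"),
                                 paste0("cell", 1:4)))

meta <- data.frame(
  nCount_RNA   = colSums(counts),      # total UMIs in each cell
  nFeature_RNA = colSums(counts > 0),  # genes with at least one UMI in each cell
  row.names    = colnames(counts)
)
meta
```

On the real object, we would expect a check such as `all.equal(unname(pbmc$nCount_RNA), unname(Matrix::colSums(pbmc[["RNA"]]@counts)))` to return `TRUE`, since both come from the same filtered counts.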
# meta.data holds per-cell metadata, one row per cell barcode; currently it
# contains orig.ident, nCount_RNA and nFeature_RNA
# The raw counts (a filtered version of what we loaded as `pbmc.data`) are
# reached through a chain of accessors: object -> assays -> RNA -> counts
head(pbmc@assays$RNA@counts, n = 5)  # keeps all 2,700 columns, so this prints a very long, wrapped block of mostly zeros
## 5 x 2700 sparse Matrix of class "dgCMatrix"
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . 1 . . . . . . . . . . . . . 1 . . . . . . . . . .
##
## AL627309.1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AP006222.2 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## LINC00115 . . . . . . . . . . . . . . 1 . . . . . . . . . . . .
# These are the counts from pbmc.data (after min.cells/min.features filtering), now stored within the Seurat object
pbmc@assays$RNA@counts[c("CD3D", "TCL1A", "MS4A1"), 1:30]
## 3 x 30 sparse Matrix of class "dgCMatrix"
##
## CD3D 4 . 10 . . 1 2 3 1 . . 2 7 1 . . 1 3 . 2 3 . . . . . 3 4 1 5
## TCL1A . . . . . . . . 1 . . . . . . . . . . . . 1 . . . . . . . .
## MS4A1 . 6 . . . . . . 1 1 1 . . . . . . . . . 36 1 2 . . 2 . . . .
The steps below encompass the standard pre-processing workflow for scRNA-seq data in Seurat. These represent the selection and filtration of cells based on QC metrics, data normalization and scaling, and the detection of highly variable features.
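Seurat allows you to easily explore QC metrics and filter cells based on any user-defined criteria. A few QC metrics commonly used by the community include:
- The number of unique genes detected in each cell. Low-quality cells or empty droplets often have very few genes, while cell doublets or multiplets may exhibit an aberrantly high gene count.
- The total number of molecules detected within a cell, which correlates strongly with the number of unique genes.
- The percentage of reads that map to the mitochondrial genome. Low-quality or dying cells often exhibit extensive mitochondrial contamination. We calculate this metric with the PercentageFeatureSet() function, which calculates the percentage of counts originating from a set of features, using all genes starting with MT- as the set of mitochondrial genes.
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
The number of unique genes (nFeature_RNA) and total molecules (nCount_RNA) are calculated automatically during CreateSeuratObject() and stored in the object metadata.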
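To make the calculation concrete, here is a minimal base R sketch of what PercentageFeatureSet() computes. It uses a small hypothetical count matrix with genes in rows and cells in columns. The helper name percent_feature_set is our own illustration and is not part of Seurat.

```r
# Hypothetical toy count matrix: genes in rows, cells in columns
counts <- matrix(c(5, 2,
                   5, 0,
                   40, 3,
                   50, 5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("MT-CO1", "MT-ND1", "CD3D", "LYZ"),
                                 c("cellA", "cellB")))

# Hypothetical helper: percentage of each cell's total counts that come
# from genes whose names match a regular expression
percent_feature_set <- function(counts, pattern) {
  matched <- grepl(pattern, rownames(counts))
  100 * colSums(counts[matched, , drop = FALSE]) / colSums(counts)
}

pct_mt <- percent_feature_set(counts, "^MT-")
pct_mt  # cellA: 10 (10 of 100 counts), cellB: 20 (2 of 10 counts)
```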
# Show QC metrics for the first 5 cells
head(pbmc@meta.data, 5)
| | orig.ident | nCount_RNA | nFeature_RNA | percent.mt |
|---|---|---|---|---|
| AAACATACAACCAC-1 | pbmc3k | 2419 | 779 | 3.0177759 |
| AAACATTGAGCTAC-1 | pbmc3k | 4903 | 1352 | 3.7935958 |
| AAACATTGATCAGC-1 | pbmc3k | 3147 | 1129 | 0.8897363 |
| AAACCGTGCTTCCG-1 | pbmc3k | 2639 | 960 | 1.7430845 |
| AAACCGTGTATGCG-1 | pbmc3k | 980 | 521 | 1.2244898 |
What do you notice has changed within the meta.data
table now that we have calculated the mitochondrial gene proportion?
Could we add more data to the meta.data table?
In the example below, we visualize QC metrics, and use these to filter cells.
# Visualize QC metrics as a violin plot
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2
Let's plot the number of features (genes) against the percentage of mitochondrial genes.
plot3 <- FeatureScatter(pbmc, feature1 = "nFeature_RNA", feature2 = "percent.mt")
plot3
Ribosomal gene expression is another property worth examining in the cells of your experiment.
Create more columns of metadata using the
PercentageFeatureSet function, this time searching for
ribosomal genes. We can calculate the percentages for the large subunit
(RPL) and small subunit (RPS) ribosomal genes.
Use FeatureScatter to plot combinations of the metrics
available in the metadata. How is the mitochondrial gene percentage related
to the ribosomal gene percentage? What do you see? Discuss in breakout
rooms.
Create new meta.data columns to contain the percentages of the large and small ribosomal genes.
Then make a scatter plot with this new data.
pbmc[["percent.riboL"]] <- PercentageFeatureSet(pbmc, pattern = "^RPL")
pbmc[["percent.riboS"]] <- PercentageFeatureSet(pbmc, pattern = "^RPS")
plot1 <- FeatureScatter(pbmc, feature1 = "percent.riboS", feature2 = "percent.riboL")
plot1
The large and small ribosomal subunit gene percentages are correlated within
each cell.
What about their relationship with the mitochondrial percentage and the RNA and feature counts?
plot2 <- FeatureScatter(pbmc, feature1 = "percent.riboL", feature2 = "percent.mt")
plot2
There are cells with low ribosomal and low mitochondrial gene percentages, as well as some outliers (low ribosomal, high mitochondrial percentage).
These are the cells you may want to exclude. Highlight the cells with a very low percentage of ribosomal genes (here, 5% or less of counts from RPL genes): create a
new column in the meta.data table and, with FeatureScatter,
plot the RNA count against the mitochondrial percentage, coloring the cells
with a very low ribosomal gene percentage.
pbmc[["lowRiboL"]] <- pbmc[["percent.riboL"]] <= 5
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt", group.by = "lowRiboL")
plot1
Once we are happy with our thresholds, let's apply them and subset the data. Here we keep cells with more than 200 and fewer than 2,500 detected features and less than 5% mitochondrial counts, which removes the cells we consider to be of poor quality.
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
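The subset() call keeps only the cells for which every condition is TRUE. The following base R sketch applies the same logic to a hypothetical metadata table. It also reports which rule would remove each cell, which can help when you are deciding on thresholds.

```r
# Hypothetical QC metadata for four cells
meta <- data.frame(
  nFeature_RNA = c(150, 800, 3000, 1200),
  percent.mt   = c(1.0, 2.5, 1.5, 8.0),
  row.names    = c("cell1", "cell2", "cell3", "cell4")
)

# Same thresholds as the subset() call above
keep <- meta$nFeature_RNA > 200 & meta$nFeature_RNA < 2500 & meta$percent.mt < 5

# Label the first rule that each cell fails
meta$reason <- ifelse(meta$nFeature_RNA <= 200, "too few features",
               ifelse(meta$nFeature_RNA >= 2500, "too many features (possible doublet)",
               ifelse(meta$percent.mt >= 5, "high mitochondrial %", "kept")))

meta
rownames(meta)[keep]  # only "cell2" passes all filters
```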
Lets replot the feature scatters and see what they look like.
plot5 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot6 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot5 + plot6
After removing unwanted cells from the dataset, the next step is to
normalize the data. By default, we employ a global-scaling normalization
method “LogNormalize” that normalizes the feature expression
measurements for each cell by the total expression, multiplies this by a
scale factor (10,000 by default), and log-transforms the result.
Normalized values are stored in pbmc[["RNA"]]@data.
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
For clarity, in this previous line of code (and in future commands), we provide the default values for certain parameters in the function call. However, this isn’t required and the same behavior can be achieved with:
pbmc <- NormalizeData(pbmc)
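The LogNormalize arithmetic is simple enough to write out directly. This minimal base R sketch uses a hypothetical toy matrix. For each cell it divides the counts by that cell's total, multiplies by the scale factor and applies log1p (the natural log of 1 + x). To our understanding, this matches the method described above. The function name log_normalize is our own.

```r
# Hypothetical raw counts: genes in rows, cells in columns (each cell has 20 counts)
counts <- matrix(c(4, 0,
                   0, 6,
                   16, 14),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("CD3D", "MS4A1", "LYZ"), c("cellA", "cellB")))

log_normalize <- function(counts, scale.factor = 10000) {
  # Divide every column (cell) by its total count
  rel <- sweep(counts, 2, colSums(counts), "/")
  # Multiply by the scale factor and apply the natural log of (1 + x)
  log1p(rel * scale.factor)
}

norm <- log_normalize(counts)
norm  # e.g. CD3D in cellA: log1p(4 / 20 * 10000) = log1p(2000)
```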
We next calculate a subset of features that exhibit high cell-to-cell variation in the dataset (i.e., they are highly expressed in some cells, and lowly expressed in others). We and others have found that focusing on these genes in downstream analysis helps to highlight biological signal in single-cell datasets.
Our procedure in Seurat is described in detail here, and improves
on previous versions by directly modeling the mean-variance relationship
inherent in single-cell data, and is implemented in the
FindVariableFeatures() function. By default, we return
2,000 features per dataset. These will be used in downstream analysis,
like PCA.
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2
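To illustrate what "directly modeling the mean-variance relationship" means, here is a simplified base R sketch of the "vst" idea as described in the FindVariableFeatures() documentation. It fits a loess curve (span 0.3) of log10 variance against log10 mean across genes. Each gene is then standardized using the variance predicted by that curve, standardized values are clipped at sqrt(number of cells), and genes are ranked by the variance of the standardized values. The simulated data and the vst_sketch name are hypothetical. Seurat's own implementation works on sparse matrices and may differ in details.

```r
set.seed(1)
n_cells <- 200
n_genes <- 300

# Hypothetical raw counts: most genes are Poisson (variance close to the mean) ...
mu <- 10^runif(n_genes, min = -1, max = 1.5)
counts <- t(sapply(mu, function(m) rpois(n_cells, m)))
# ... except genes 1-5, which are overdispersed (negative binomial, mean 5)
counts[1:5, ] <- t(replicate(5, rnbinom(n_cells, mu = 5, size = 0.5)))

vst_sketch <- function(counts, span = 0.3) {
  gene_mean <- rowMeans(counts)
  gene_var  <- apply(counts, 1, var)
  ok <- gene_var > 0
  # Fit the mean-variance trend on the log10 scale
  fit <- loess(log10(gene_var[ok]) ~ log10(gene_mean[ok]), span = span)
  expected_var <- rep(0, nrow(counts))
  expected_var[ok] <- 10^fitted(fit)
  # Standardize with the expected sd, clip large values, then take the variance
  clip_max <- sqrt(ncol(counts))
  std_var <- rep(0, nrow(counts))
  for (i in which(ok)) {
    z <- (counts[i, ] - gene_mean[i]) / sqrt(expected_var[i])
    std_var[i] <- var(pmin(z, clip_max))
  }
  std_var
}

std_var <- vst_sketch(counts)
top5 <- order(std_var, decreasing = TRUE)[1:5]
top5  # expected to be the five overdispersed genes
```

Because each gene is compared with the variance expected for its mean, highly expressed genes are not selected merely because their raw variance is large.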
What if we wanted to look at genes we are specifically interested in? We can create a character vector of gene names and apply that to this plot.
Let's look at some genes that could be of interest, such as IL8, IDH2 and CXCL3.
# create a vector of genes of interest
goi <- c("IL8", "IDH2", "CXCL3")
# label the genes of interest on the variable feature plot
plot3 <- LabelPoints(plot = plot1, points = goi, repel = TRUE)
plot2 + plot3
Next, we apply a linear transformation (‘scaling’) that is a standard
pre-processing step prior to dimensional reduction techniques like PCA.
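The ScaleData() function:
- Shifts the expression of each gene, so that the mean expression across cells is 0
- Scales the expression of each gene, so that the variance across cells is 1. This gives equal weight to each gene in downstream analyses, so that highly-expressed genes do not dominate
- Stores the results in pbmc[["RNA"]]@scale.data
all.genes <- rownames(pbmc)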
pbmc <- ScaleData(pbmc, features = all.genes)
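The following minimal base R sketch shows the centering and scaling described above on a small hypothetical log-normalized matrix. ScaleData() also caps scaled values at scale.max (10 by default). We mimic that cap on large positive values, but treat the clipping as an approximation of Seurat's behavior. The scale_rows name is our own.

```r
# Hypothetical log-normalized data: genes in rows, cells in columns
norm_toy <- matrix(c(1, 2, 3, 4,
                     0, 0, 5, 5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))

scale_rows <- function(x, scale.max = 10) {
  # scale() works column-wise, so transpose, scale, and transpose back
  z <- t(scale(t(x)))
  # Cap very large positive values (e.g. a gene expressed in a single cell)
  z[z > scale.max] <- scale.max
  z
}

z <- scale_rows(norm_toy)
rowMeans(z)       # each gene now has mean 0 ...
apply(z, 1, sd)   # ... and standard deviation 1

# A gene detected in 1 of 200 cells would reach about 14.07 without the cap
sparse_gene <- matrix(c(rep(0, 199), 1), nrow = 1)
max(scale_rows(sparse_gene))
```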
Scaling is an essential step in the Seurat workflow, but only on
genes that will be used as input to PCA. Therefore, the default in
ScaleData() is only to perform scaling on the previously
identified variable features (2,000 by default). To do this, omit the
features argument in the previous function call, i.e.
# pbmc <- ScaleData(pbmc)
Your PCA and clustering results will be unaffected. However, Seurat
heatmaps (produced as shown below with DoHeatmap()) require
genes in the heatmap to be scaled, to make sure highly-expressed genes
don’t dominate the heatmap. To make sure we don’t leave any genes out of
the heatmap later, we are scaling all genes in this tutorial.
In Seurat v2 we also use the ScaleData()
function to remove unwanted sources of variation from a single-cell
dataset. For example, we could ‘regress out’ heterogeneity associated
with (for example) cell cycle stage, or mitochondrial contamination.
These features are still supported in ScaleData() in
Seurat v3, i.e.:
# pbmc <- ScaleData(pbmc, vars.to.regress = 'percent.mt')
However, particularly for advanced users who would like to use this
functionality, we strongly recommend the use of our new normalization
workflow, SCTransform(). The method is described in our paper,
with a separate vignette using Seurat v3 here. As with
ScaleData(), the function SCTransform() also
includes a vars.to.regress parameter.
Next we perform PCA on the scaled data. By default, only the
previously determined variable features are used as input, but a
different subset can be chosen with the features argument.
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
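RunPCA() computes a truncated PCA (via the irlba package) on the scaled data. As a rough base R stand-in, running prcomp() on a hypothetical scaled matrix produces the same ingredients:
- cell embeddings (x)
- gene loadings (rotation)
- per-PC standard deviations (sdev)

This is a sketch for intuition only. The simulated groups are our own assumption.

```r
set.seed(42)
# Hypothetical scaled data: 50 genes x 100 cells, where cells 1-50
# have higher expression of genes 1-10 than cells 51-100
scaled <- matrix(rnorm(50 * 100), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:100)))
scaled[1:10, 1:50] <- scaled[1:10, 1:50] + 2

# prcomp() expects observations (cells) in rows, so transpose
pca <- prcomp(t(scaled), center = TRUE, scale. = FALSE)

cell_embeddings <- pca$x          # cells x PCs
gene_loadings   <- pca$rotation   # genes x PCs
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)  # % of total variance per PC
head(round(pct_var, 1))

# PC1 should separate the two simulated groups of cells
tapply(cell_embeddings[, 1], rep(c("group1", "group2"), each = 50), mean)
```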
Seurat provides several useful ways of visualizing both cells and
features that define the PCA, including VizDimLoadings(),
DimPlot(), and DimHeatmap().
# Examine and visualize PCA results a few different ways
print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
## PC_ 1
## Positive: CST3, TYROBP, LST1, AIF1, FTL
## Negative: MALAT1, LTB, IL32, IL7R, CD2
## PC_ 2
## Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1
## Negative: NKG7, PRF1, CST7, GZMB, GZMA
## PC_ 3
## Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1
## Negative: PPBP, PF4, SDPR, SPARC, GNG11
## PC_ 4
## Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1
## Negative: VIM, IL7R, S100A6, IL32, S100A8
## PC_ 5
## Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY
## Negative: LTB, IL7R, CKB, VIM, MS4A7
VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")
DimPlot(pbmc, reduction = "pca")
In particular DimHeatmap() allows for easy exploration
of the primary sources of heterogeneity in a dataset, and can be useful
when trying to decide which PCs to include for further downstream
analyses. Both cells and features are ordered according to their PCA
scores. Setting cells to a number plots the ‘extreme’ cells
on both ends of the spectrum, which dramatically speeds plotting for
large datasets. Though clearly a supervised analysis, we find this to be
a valuable tool for exploring correlated feature sets.
DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)
DimHeatmap(pbmc, dims = 1:15, cells = 500, balanced = TRUE)
To overcome the extensive technical noise in any single feature for scRNA-seq data, Seurat clusters cells based on their PCA scores, with each PC essentially representing a ‘metafeature’ that combines information across a correlated feature set. The top principal components therefore represent a robust compression of the dataset. However, how many components should we choose to include? 10? 20? 100?
In Macosko et al, we implemented a resampling test inspired by the JackStraw procedure. We randomly permute a subset of the data (1% by default) and rerun PCA, constructing a ‘null distribution’ of feature scores, and repeat this procedure. We identify ‘significant’ PCs as those that have a strong enrichment of low p-value features.
# Note: The Seurat defaults are num.replicate = 100, prop.freq = 0.01. The parameters we're
# using here are just for speed, and will give conservative p-values.
pbmc <- JackStraw(pbmc, num.replicate = 10, prop.freq = 0.1)
pbmc <- ScoreJackStraw(pbmc, dims = 1:20)
The JackStrawPlot() function provides a visualization
tool for comparing the distribution of p-values for each PC with a
uniform distribution (dashed line). ‘Significant’ PCs will show a strong
enrichment of features with low p-values (solid curve above the dashed
line). In this case it appears that there is a sharp drop-off in
significance after the first 10-12 PCs.
JackStrawPlot(pbmc, dims = 1:15)
# Seurat plots this in a weird way. Let's adjust it a bit.
JackStrawPlot(pbmc, dims = 1:15) + coord_cartesian() + geom_abline(intercept = 0, slope = 0.05)
# For each PC, the furthest point to the right below the solid line gives the proportion of
# significant genes at an FDR of 0.05.
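The rule in the comment above is essentially the Benjamini-Hochberg step-up procedure read off a QQ plot. We assume that, after coord_cartesian(), the x-axis holds the theoretical uniform quantiles (approximately i/m) and the y-axis holds the sorted empirical p-values. Under that assumption, a point lies below the line y = 0.05x when p_(i) <= 0.05 * i/m. The base R sketch below uses hypothetical p-values for a single PC. It checks that the rightmost such point gives the same count as p.adjust(method = "BH").

```r
# Hypothetical JackStraw p-values for the genes of one PC
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)
m <- length(p)
p_sorted <- sort(p)
theoretical <- seq_len(m) / m            # x-axis positions (uniform quantiles)

below_line <- p_sorted <= 0.05 * theoretical
n_sig <- if (any(below_line)) max(which(below_line)) else 0
prop_sig <- n_sig / m                    # x-position of the rightmost point below the line

# Same answer from the Benjamini-Hochberg adjustment
n_sig_bh <- sum(p.adjust(p, method = "BH") <= 0.05)
c(n_sig = n_sig, prop_sig = prop_sig, n_sig_bh = n_sig_bh)
```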
An alternative heuristic method generates an ‘Elbow plot’: a ranking
of principal components based on the percentage of variance explained by
each one (ElbowPlot() function, which plots the standard deviation of each
PC). In this example, we can observe an ‘elbow’ around PC9-10, suggesting
that the majority of true signal is captured in the first 10 PCs.
ElbowPlot(pbmc)
Identifying the true dimensionality of a dataset can be challenging and uncertain. We therefore suggest considering these three approaches. The first is more supervised, exploring PCs to determine relevant sources of heterogeneity, and could be used in conjunction with GSEA, for example. The second implements a statistical test based on a random null model, but is time-consuming for large datasets and may not return a clear PC cutoff. The third is a commonly used heuristic that can be calculated instantly. In this example, all three approaches yielded similar results, but we might have been justified in choosing any cutoff between PC 7 and PC 12.
We chose 10 here, but encourage users to repeat downstream analyses with a different number of PCs (10, 15, or even 50). The results often do not differ dramatically. We also advise erring on the higher side when choosing this parameter: for example, performing downstream analyses with only 5 PCs significantly and adversely affects results.
Seurat v3 applies a graph-based clustering approach, building upon initial strategies in (Macosko et al). Importantly, the distance metric which drives the clustering analysis (based on previously identified PCs) remains the same. However, our approach to partitioning the cellular distance matrix into clusters has dramatically improved. Our approach was heavily inspired by recent manuscripts which applied graph-based clustering approaches to scRNA-seq data [SNN-Cliq, Xu and Su, Bioinformatics, 2015] and CyTOF data [PhenoGraph, Levine et al., Cell, 2015]. Briefly, these methods embed cells in a graph structure - for example a K-nearest neighbor (KNN) graph, with edges drawn between cells with similar feature expression patterns, and then attempt to partition this graph into highly interconnected ‘quasi-cliques’ or ‘communities’.
As in PhenoGraph, we first construct a KNN graph based on the
euclidean distance in PCA space, and refine the edge weights between any
two cells based on the shared overlap in their local neighborhoods
(Jaccard similarity). This step is performed using the
FindNeighbors() function, and takes as input the previously
defined dimensionality of the dataset (first 10 PCs).
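The following is a minimal base R sketch of the shared-nearest-neighbor idea. It uses hypothetical PCA embeddings for 12 cells in two separated groups. First we find each cell's k nearest neighbors, counting the cell as one of its own neighbors. Then we weight each pair of cells by the Jaccard overlap of their neighborhoods, shared / (2k - shared), and drop weak edges below 1/15 (Seurat's default prune.SNN). Seurat uses a larger k (k.param, 20 by default) and an approximate neighbor search, so treat this as an illustration only.

```r
set.seed(7)
# Hypothetical PCA embeddings: 12 cells in two well-separated groups, 3 PCs
emb <- rbind(matrix(rnorm(18, mean = 0), ncol = 3),
             matrix(rnorm(18, mean = 5), ncol = 3))
rownames(emb) <- paste0("cell", 1:12)
k <- 4

# k nearest neighbours of each cell (distance 0 to itself, so it comes first)
d <- as.matrix(dist(emb))
knn <- t(apply(d, 1, function(dists) order(dists)[1:k]))

# Jaccard similarity of neighbourhoods: |shared| / |union|
n <- nrow(emb)
snn <- matrix(0, n, n, dimnames = list(rownames(emb), rownames(emb)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    snn[i, j] <- shared / (2 * k - shared)
  }
}
snn[snn < 1/15] <- 0   # prune weak edges

round(snn[1:6, 1:6], 2)  # edges within the first group
</imports>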
To cluster the cells, we next apply modularity optimization
techniques such as the Louvain algorithm (default) [Blondel et al., Journal of Statistical Mechanics, 2008]
or SLM [Waltman and van Eck, European Physical Journal B, 2013], to iteratively
group cells together, with the goal of optimizing the standard
modularity function. The FindClusters() function implements
this procedure, and contains a resolution parameter that sets the
‘granularity’ of the downstream clustering, with increased values
leading to a greater number of clusters. We find that setting this
parameter between 0.4-1.2 typically returns good results for single-cell
datasets of around 3K cells. Optimal resolution often increases for
larger datasets. The clusters can be found using the
Idents() function.
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2638
## Number of edges: 95965
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8723
## Number of communities: 9
## Elapsed time: 0 seconds
# Look at cluster IDs of the first 5 cells
head(Idents(pbmc), 5)
## AAACATACAACCAC-1 AAACATTGAGCTAC-1 AAACATTGATCAGC-1 AAACCGTGCTTCCG-1
## 2 3 2 1
## AAACCGTGTATGCG-1
## 6
## Levels: 0 1 2 3 4 5 6 7 8
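FindClusters() searches for the partition of the SNN graph with the highest modularity. The base R sketch below computes modularity itself rather than the Louvain search:

Q = (1/2m) Σ_ij [A_ij - γ k_i k_j / (2m)] δ(c_i, c_j)

It does this for a hypothetical graph of two triangles joined by one edge. In this formula, γ plays the role of a resolution parameter, and larger values penalize big communities more. This is the standard textbook form. Seurat's optimizer may differ in details such as edge weights.

```r
# Newman-Girvan modularity with a resolution parameter (gamma)
modularity <- function(A, membership, resolution = 1) {
  k <- rowSums(A)                      # node degrees
  two_m <- sum(A)                      # 2m for an undirected adjacency matrix
  same <- outer(membership, membership, "==")
  sum((A - resolution * outer(k, k) / two_m) * same) / two_m
}

# Hypothetical graph: triangles (1,2,3) and (4,5,6) joined by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1                   # make the graph undirected

modularity(A, c(1, 1, 1, 2, 2, 2))     # the two triangles: 5/14, about 0.357
modularity(A, rep(1, 6))               # everything in one cluster: 0
```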
Seurat offers several non-linear dimensional reduction techniques, such as tSNE and UMAP, to visualize and explore these datasets. The goal of these algorithms is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space. Cells within the graph-based clusters determined above should co-localize on these dimension reduction plots. As input to the UMAP and tSNE, we suggest using the same PCs as input to the clustering analysis.
# If you haven't installed UMAP, you can do so via reticulate::py_install(packages =
# 'umap-learn')
pbmc <- RunUMAP(pbmc, dims = 1:10)
# note that you can set `label = TRUE` or use the LabelClusters function to help label
# individual clusters
DimPlot(pbmc, reduction = "umap")
You can save the object at this point so that it can easily be loaded back in without having to rerun the computationally intensive steps performed above, or easily shared with collaborators.
saveRDS(pbmc, file = "pbmc_tutorial.rds")
Run FindNeighbors and FindClusters again,
with a different number of dimensions or with a different resolution.
Examine the resulting clusters using DimPlot.
To maintain the flow of this tutorial, please put the output of this
exploration in a different variable, such as pbmc2!
Seurat can help you find markers that define clusters via
differential expression. By default, it identifies positive and negative
markers of a single cluster (specified in ident.1),
compared to all other cells. FindAllMarkers() automates
this process for all clusters, but you can also test groups of clusters
vs. each other, or against all cells.
The min.pct argument requires a feature to be detected
at a minimum percentage in either of the two groups of cells, and the
logfc.threshold argument requires a feature to be differentially expressed
(on average) by some amount between the two groups. You can set both of
these to 0, but with a dramatic increase in time - since this will test
a large number of features that are unlikely to be highly
discriminatory. As another option to speed up these computations,
max.cells.per.ident can be set. This will downsample each
identity class to have no more cells than whatever this is set to. While
there is generally going to be a loss in power, the speed increases can
be significant and the most highly differentially expressed features
will likely still rise to the top.
# find all markers of cluster 2
cluster2.markers <- FindMarkers(pbmc, ident.1 = 2, min.pct = 0.25)
head(cluster2.markers, n = 5)
| | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| IL32 | 0 | 1.2154360 | 0.949 | 0.466 | 0 |
| LTB | 0 | 1.2828597 | 0.981 | 0.644 | 0 |
| CD3D | 0 | 0.9359210 | 0.922 | 0.433 | 0 |
| IL7R | 0 | 1.1776027 | 0.748 | 0.327 | 0 |
| LDHB | 0 | 0.8837324 | 0.953 | 0.614 | 0 |
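To show where the columns in the table above come from, here is a hedged base R sketch for a single hypothetical gene:
- pct.1 and pct.2 are the fractions of cells in each group with non-zero expression.
- avg_log2FC is the difference of log2(mean(expm1(x)) + 1) between the groups. We assume the Seurat 4.0 formula here, and other Seurat versions handle the pseudocount slightly differently.
- The p-value comes from wilcox.test(), standing in for Seurat's default Wilcoxon rank sum test.

Seurat's p_val_adj is a Bonferroni correction using all genes in the dataset, which we do not reproduce here.

```r
# Hypothetical log-normalized values (log1p scale) for one gene
x1 <- log1p(c(3, 3, 0, 0))   # cells in the cluster of interest (ident.1)
x2 <- log1p(c(0, 0, 0, 1))   # all other cells (ident.2)

pct.1 <- mean(x1 > 0)        # fraction of cells expressing the gene
pct.2 <- mean(x2 > 0)

# Log2 fold change of the mean expression on the original (non-log) scale
avg_log2FC <- log2(mean(expm1(x1)) + 1) - log2(mean(expm1(x2)) + 1)

# min.pct = 0.25: the gene is tested only if detected in enough cells of either group
passes_min_pct <- max(pct.1, pct.2) >= 0.25

# Wilcoxon rank sum test (normal approximation, since there are ties)
p_val <- wilcox.test(x1, x2, exact = FALSE)$p.value

c(pct.1 = pct.1, pct.2 = pct.2, avg_log2FC = avg_log2FC, p_val = p_val)
```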
# find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)
| | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 0 | 4.267579 | 0.975 | 0.039 | 0 |
| IFITM3 | 0 | 3.877105 | 0.975 | 0.048 | 0 |
| CFD | 0 | 3.411039 | 0.938 | 0.037 | 0 |
| CD68 | 0 | 3.014535 | 0.926 | 0.035 | 0 |
| RP11-290F20.3 | 0 | 2.722684 | 0.840 | 0.016 | 0 |
# find markers for every cluster compared to all remaining cells, report only the positive
# ones
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
pbmc.markers %>%
group_by(cluster) %>%
slice_max(n = 2, order_by = avg_log2FC)
| p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene |
|---|---|---|---|---|---|---|
| 0 | 1.333503 | 0.435 | 0.108 | 0 | 0 | CCR7 |
| 0 | 1.069166 | 0.897 | 0.593 | 0 | 0 | LDHB |
| 0 | 5.570063 | 0.996 | 0.215 | 0 | 1 | S100A9 |
| 0 | 5.477394 | 0.975 | 0.121 | 0 | 1 | S100A8 |
| 0 | 1.282860 | 0.981 | 0.644 | 0 | 2 | LTB |
| 0 | 1.240361 | 0.424 | 0.111 | 0 | 2 | AQP3 |
| 0 | 4.310172 | 0.936 | 0.041 | 0 | 3 | CD79A |
| 0 | 3.591579 | 0.622 | 0.022 | 0 | 3 | TCL1A |
| 0 | 3.006740 | 0.595 | 0.056 | 0 | 4 | GZMK |
| 0 | 2.966206 | 0.957 | 0.241 | 0 | 4 | CCL5 |
| 0 | 3.311697 | 0.975 | 0.134 | 0 | 5 | FCGR3A |
| 0 | 3.085654 | 1.000 | 0.315 | 0 | 5 | LST1 |
| 0 | 4.917370 | 0.958 | 0.135 | 0 | 6 | GNLY |
| 0 | 4.888172 | 0.986 | 0.071 | 0 | 6 | GZMB |
| 0 | 3.871151 | 0.812 | 0.011 | 0 | 7 | FCER1A |
| 0 | 2.874465 | 1.000 | 0.513 | 0 | 7 | HLA-DPB1 |
| 0 | 8.575862 | 1.000 | 0.024 | 0 | 8 | PPBP |
| 0 | 7.243377 | 1.000 | 0.010 | 0 | 8 | PF4 |
Seurat has several tests for differential expression which can be set with the test.use parameter (see our DE vignette for details). For example, the ROC test returns the ‘classification power’ for any individual marker (ranging from 0 - random, to 1 - perfect).
cluster0.markers <- FindMarkers(pbmc, ident.1 = 0, logfc.threshold = 0.25, test.use = "roc", only.pos = TRUE)
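As we understand Seurat's implementation, the ROC test scores each marker by the area under the ROC curve (AUC) for separating the two groups. It then reports "power" as 2 * |AUC - 0.5|, which gives the 0 (random) to 1 (perfect) range described above. The following base R sketch computes AUC from ranks (the Mann-Whitney relationship) on hypothetical expression values. The auc and roc_power names are our own.

```r
# AUC via ranks: probability that a random cell from the cluster has higher
# expression than a random cell from the other group (ties count as 1/2)
auc <- function(x_pos, x_neg) {
  r <- rank(c(x_pos, x_neg))           # average ranks handle ties
  n1 <- length(x_pos)
  n2 <- length(x_neg)
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n2)
}
roc_power <- function(a) 2 * abs(a - 0.5)

a_perfect <- auc(c(5, 6, 7), c(1, 2, 3))   # marker higher in the cluster
a_reverse <- auc(c(1, 2, 3), c(5, 6, 7))   # marker lower in the cluster
a_random  <- auc(c(1, 1), c(1, 1))         # no difference at all

c(a_perfect, a_reverse, a_random)          # 1, 0, 0.5
sapply(c(a_perfect, a_reverse, a_random), roc_power)  # 1, 1, 0
```

A perfectly negative marker also has power 1. This is one reason the call above sets only.pos = TRUE.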
We include several tools for visualizing marker expression.
VlnPlot() (shows expression probability distributions
across clusters), and FeaturePlot() (visualizes feature
expression on a tSNE or PCA plot) are our most commonly used
visualizations. We also suggest exploring RidgePlot(),
CellScatter(), and DotPlot() as additional
methods to view your dataset.
VlnPlot(pbmc, features = c("MS4A1", "CD79A"))
# you can plot raw counts as well
VlnPlot(pbmc, features = c("NKG7", "PF4"), slot = "counts", log = TRUE)
FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
"CD8A"))
Ridge plots, cell scatter plots and dot plots are other ways to view the same features. Try replacing
FeaturePlot with these functions.
RidgePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
"CD8A"))
For CellScatter plots, you will need the IDs of the cells you want to
compare. You can access these using the
[[ notation.
head(pbmc[[]])
| | orig.ident | nCount_RNA | nFeature_RNA | percent.mt | percent.riboL | percent.riboS | lowRiboL | RNA_snn_res.0.5 | seurat_clusters |
|---|---|---|---|---|---|---|---|---|---|
| AAACATACAACCAC-1 | pbmc3k | 2419 | 779 | 3.0177759 | 24.76230 | 18.933444 | FALSE | 2 | 2 |
| AAACATTGAGCTAC-1 | pbmc3k | 4903 | 1352 | 3.7935958 | 23.98532 | 18.417296 | FALSE | 3 | 3 |
| AAACATTGATCAGC-1 | pbmc3k | 3147 | 1129 | 0.8897363 | 18.04894 | 13.632031 | FALSE | 2 | 2 |
| AAACCGTGCTTCCG-1 | pbmc3k | 2639 | 960 | 1.7430845 | 15.08147 | 9.170140 | FALSE | 1 | 1 |
| AAACCGTGTATGCG-1 | pbmc3k | 980 | 521 | 1.2244898 | 10.20408 | 4.693878 | FALSE | 6 | 6 |
| AAACGCACTGGTAC-1 | pbmc3k | 2163 | 781 | 1.6643551 | 22.28386 | 13.915858 | FALSE | 2 | 2 |
CellScatter(pbmc, cell1 = "AAACATACAACCAC-1", cell2 = "AAACATTGAGCTAC-1")
DotPlots
DotPlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A"))
Which plots do you prefer? Discuss.
DoHeatmap() generates an expression heatmap for given
cells and features. In this case, we are plotting the top 10 markers (or
all markers if fewer than 10) for each cluster.
pbmc.markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) -> top10
DoHeatmap(pbmc, features = top10$gene) + NoLegend()
Fortunately in the case of this dataset, we can use canonical markers to easily match the unbiased clustering to known cell types:
| Cluster ID | Markers | Cell Type |
|---|---|---|
| 0 | IL7R, CCR7 | Naive CD4+ T |
| 1 | CD14, LYZ | CD14+ Mono |
| 2 | IL7R, S100A4 | Memory CD4+ |
| 3 | MS4A1 | B |
| 4 | CD8A | CD8+ T |
| 5 | FCGR3A, MS4A7 | FCGR3A+ Mono |
| 6 | GNLY, NKG7 | NK |
| 7 | FCER1A, CST3 | DC |
| 8 | PPBP | Platelet |
new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono",
"NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
saveRDS(pbmc, file = "pbmc3k_final.rds")
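RenameIdents() relabels identity classes using a vector whose names are the old cluster IDs. The order of new.cluster.ids must therefore follow levels(pbmc). The same name-based lookup can be sketched in base R on a hypothetical factor of cluster IDs.

```r
# Hypothetical cluster assignments for five cells
clusters <- factor(c("2", "3", "2", "1", "0"), levels = c("0", "1", "2", "3"))

new.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B")
names(new.ids) <- levels(clusters)   # "0" -> "Naive CD4 T", "1" -> "CD14+ Mono", ...

# Look up each cell's old ID by name to get its new label
relabelled <- factor(unname(new.ids[as.character(clusters)]),
                     levels = unname(new.ids))
relabelled
```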
# install.packages('BiocManager')
# BiocManager::install(c('SingleCellExperiment','SingleR','celldex'),ask=F)
library(SingleCellExperiment)
library(SingleR)
library(celldex)
In this workshop we have focused on the Seurat package. However, there is another whole ecosystem of R packages for single cell analysis within Bioconductor. We won’t go into any detail on these packages in this workshop, but there is good material describing the object type online: OSCA.
For now, we’ll just convert our Seurat object into a SingleCellExperiment object. Some popular Bioconductor packages that work with this type are Slingshot, Scran and Scater.
sce <- as.SingleCellExperiment(pbmc)
We will now use a package called SingleR to label each cell. SingleR
uses a reference data set of cell types with expression data to infer
the best label for each cell. A convenient collection of cell type
references is provided by the celldex package, which currently
contains the following sets:
ls("package:celldex")
## [1] "BlueprintEncodeData" "DatabaseImmuneCellExpressionData"
## [3] "HumanPrimaryCellAtlasData" "ImmGenData"
## [5] "MonacoImmuneData" "MouseRNAseqData"
## [7] "NovershternHematopoieticData"
In this example, we’ll use the HumanPrimaryCellAtlasData
set, which contains both high-level (label.main) and fine-grained (label.fine) labels.
ref.set <- celldex::HumanPrimaryCellAtlasData()
head(unique(ref.set$label.main))
## [1] "DC" "Smooth_muscle_cells" "Epithelial_cells"
## [4] "B_cell" "Neutrophils" "T_cells"
Some examples of the “fine” labels:
head(unique(ref.set$label.fine))
## [1] "DC:monocyte-derived:immature" "DC:monocyte-derived:Galectin-1"
## [3] "DC:monocyte-derived:LPS" "DC:monocyte-derived"
## [5] "Smooth_muscle_cells:bronchial:vit_D" "Smooth_muscle_cells:bronchial"
Now we’ll label our cells using the SingleCellExperiment object, with the above reference set.
pred.cnts <- SingleR::SingleR(test = sce, ref = ref.set, labels = ref.set$label.main)
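SingleR's core idea can be illustrated with a much-simplified base-R sketch: correlate each test cell with each reference label's expression profile (Spearman correlation) and assign the best-matching label. The genes, reference profiles and cells below are hypothetical. The real SingleR restricts the comparison to marker genes, scores each label using correlations with all of that label's reference samples, and then fine-tunes the assignment iteratively, so treat this only as intuition.

```r
# Hypothetical reference profiles: 4 genes (rows) x 3 labels (columns)
genes <- c("CD3D", "MS4A1", "CD14", "LYZ")
ref <- cbind(T_cells = c(9, 1, 1, 2), B_cell = c(1, 9, 1, 2), Monocyte = c(1, 1, 8, 9))
rownames(ref) <- genes

# Hypothetical test cells: the same 4 genes (rows) x 3 cells (columns)
test <- cbind(cell1 = c(7, 0, 1, 1), cell2 = c(0, 1, 6, 8), cell3 = c(1, 5, 0, 1))
rownames(test) <- genes

# Spearman correlation of every test cell with every reference profile (cells x labels)
scores <- cor(test, ref, method = "spearman")
# Assign each cell the label with the highest correlation
assigned <- colnames(ref)[apply(scores, 1, which.max)]
names(assigned) <- colnames(test)
print(assigned)
```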
Keep only the labels assigned to more than 10 cells (relabelling the rest as “Other”), add these labels to the metadata of our Seurat object, and plot them on our UMAP.
lbls.keep <- table(pred.cnts$first.labels) > 10
pbmc[["SingleR.labels"]] <- ifelse(lbls.keep[pred.cnts$first.labels], pred.cnts$first.labels, "Other")
DimPlot(pbmc, reduction = "umap", group.by = "SingleR.labels")
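The filtering step above can be checked on a toy example. The label vector below is hypothetical and stands in for pred.cnts$first.labels. The as.vector() call is added only to drop the table attributes so the result is a plain character vector; the selection logic is the same as in the code above.

```r
# Hypothetical per-cell labels standing in for pred.cnts$first.labels
first.labels <- c(rep("T_cells", 15), rep("Monocyte", 12), rep("Platelets", 3))

# TRUE for labels assigned to more than 10 cells; the entries are named by label
lbls.keep <- table(first.labels) > 10
# Look up each cell's label in lbls.keep: one TRUE/FALSE per cell
keep <- as.vector(lbls.keep[first.labels])
# Retain common labels and relabel rare ones as "Other"
final.labels <- ifelse(keep, first.labels, "Other")
print(table(final.labels))
```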
Note that SingleR labels each cell independently and does not use the clusters we computed earlier, so it is reassuring that its labels match those clusters reasonably well.
For this workshop, we’ll use the PBMC data object with Monocle for pseudotime trajectory analysis. It is debatable whether this is a suitable dataset, but it will suit our needs for demonstration purposes.
This content is based on the Calculating Trajectories with Monocle 3 and Seurat material as well as the Monocle 3 documentation, combined with the PBMC dataset from the original Seurat vignette. We recommend reading the Monocle 3 documentation for a deeper understanding of the Monocle package.
First, load Monocle and SeuratWrappers:
library(monocle3)
library(SeuratWrappers)
As the PBMC data have already been processed, we can proceed with
converting the pbmc Seurat object to a cell_data_set
object, which is a class from the Monocle package. The conversion is
done with the as.cell_data_set function from the
SeuratWrappers package.
While we have performed the general analysis steps of quality control, scaling and normalization, dimensionality reduction and clustering with Seurat, Monocle is also capable of performing these steps with its own in-built functions. It is often a matter of preference which package to use, depending on what downstream tasks the analyst would like to perform.
We aren’t going to delve deeply into the properties of the
cell_data_set object. Just be aware that this is a
different way to represent the count assay data and dimensionality
reduction data. The functions from the Monocle package
expect the scRNA-seq data to be of this class, so the Seurat object
needs to be converted first. It also means that the Seurat
functions that we’ve been using will not work with the
cell_data_set object.
cds <- as.cell_data_set(pbmc)
While we have previously clustered the pbmc dataset using Seurat, Monocle will also calculate ‘partitions’: these are superclusters of the Louvain/Leiden communities that are found using a kNN pruning method. The warning message during the conversion notes that Seurat doesn’t calculate partitions, so clusters need to be re-calculated using Monocle.
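As a rough illustration of the partition idea, the base-R sketch below treats clusters as nodes of a graph, links two clusters when many kNN edges connect their cells, and reports each connected component as a partition. The edge counts and the threshold of 10 are hypothetical. Monocle instead decides which communities to join with a statistical test (controlled by the partition_qval argument of cluster_cells()), so this is only a simplified stand-in.

```r
# Hypothetical counts of kNN edges between 5 clusters (symmetric matrix)
edges <- matrix(c(
   0, 40,  2,  0,  1,
  40,  0, 35,  0,  0,
   2, 35,  0,  1,  0,
   0,  0,  1,  0, 50,
   1,  0,  0, 50,  0
), nrow = 5, byrow = TRUE, dimnames = list(paste0("c", 1:5), paste0("c", 1:5)))

# Keep only "strong" links (a hypothetical threshold replacing Monocle's test)
linked <- edges >= 10

# Partitions = connected components of the thresholded cluster graph (breadth-first search)
find_partitions <- function(adj) {
  n <- nrow(adj)
  part <- rep(NA_integer_, n)
  names(part) <- rownames(adj)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(part[start])) next
    current <- current + 1L
    part[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # visit unassigned clusters linked to the current one
      for (nb in which(adj[node, ] & is.na(part))) {
        part[nb] <- current
        queue <- c(queue, nb)
      }
    }
  }
  part
}

partitions <- find_partitions(linked)
print(partitions)
```

Here c1, c2 and c3 form one partition and c4 and c5 another, because the weak links (c1 to c3, c3 to c4, c1 to c5) fall below the threshold.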
Examine the cds object:
## Inspect the cds object and compare it to the Seurat pbmc object
cds
Now re-cluster the cds object:
cds <- cluster_cells(cds)
p1 <- plot_cells(cds, show_trajectory_graph = FALSE)
p2 <- plot_cells(cds, color_cells_by = "partition", show_trajectory_graph = FALSE)
wrap_plots(p1, p2)
We can see that in the first plot, Monocle has identified 3 clusters, but all the clusters fall within the same partition. Ideally, partitions should correspond to groups of cells on the same path of differentiation, i.e. cells within the same trajectory.
Source: Haematopoiesis and red blood cells
We can see from this figure of haematopoiesis that our PBMC sample contains a mix of different cell types and is therefore unlikely to be suitable for calculating a pseudotime trajectory. Nonetheless, we’ll demonstrate the steps involved.
Next, we need to run learn_graph to learn the trajectory
graph. This function aims to learn how cells transition through a
biological program of gene expression changes in an experiment.
cds <- learn_graph(cds)
plot_cells(cds, label_groups_by_cluster = FALSE, label_leaves = FALSE, label_branch_points = FALSE)
As expected from the partition plot, Monocle thinks all the cells are from the same partition and therefore has plotted a trajectory line that connects all clusters.
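learn_graph() fits a principal graph through the cells in UMAP space. Monocle's algorithm is considerably more elaborate, but the basic idea of fitting a tree that can branch can be sketched by connecting cluster centroids with a minimum spanning tree (a cluster-level approach used by other tools such as Slingshot). The centroids below are hypothetical, and the code (Prim's algorithm in base R) is not what learn_graph() runs.

```r
# Hypothetical 2-D (UMAP-like) centroids of four clusters
centroids <- rbind(
  A = c(0, 0),
  B = c(1, 0),
  C = c(2, 1),
  D = c(2, -1)
)
d <- as.matrix(dist(centroids))  # Euclidean distances between centroids

# Prim's algorithm: grow a tree from the first centroid, always adding the
# cheapest edge that connects a centroid not yet in the tree
mst_edges <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- character(0)
  while (!all(in_tree)) {
    sub <- d[in_tree, !in_tree, drop = FALSE]
    best <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- rownames(sub)[best[1]]
    to <- colnames(sub)[best[2]]
    edges <- c(edges, paste(from, to, sep = "-"))
    in_tree[rownames(d) == to] <- TRUE
  }
  edges
}

tree <- mst_edges(d)
print(tree)
```

The resulting tree branches at B (B links to both C and D), which is the kind of branching structure a trajectory graph is meant to capture.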
Can we fix this?
Monocle currently thinks that all cells belong to the same partition. We might be able to tweak the clustering for a better result. One thing to notice is that Monocle generated a different number of clusters (3) than Seurat did (9).
If we examine the default parameters with ?FindClusters
and ?cluster_cells, we might notice that Seurat’s default
clustering algorithm is louvain while Monocle’s is
leiden. We aren’t going to delve into the details of these
algorithms, but be aware of the default behavior of
your analysis tools: the choice of algorithm will affect the
results of the clustering.
We can change the algorithm with
cds <- cluster_cells(cds, cluster_method = "louvain")
but in this case, we will keep the default leiden algorithm and
instead adjust its parameters to increase the number of clusters
yielded. The k argument sets the number of
nearest neighbors used when creating the k-nearest-neighbor graph. A
larger k value (the default is 20) generally yields fewer
clusters, and a smaller k value generally yields more
clusters.
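Leiden clustering itself is too involved for a short example, but the effect of k on the underlying graph can be sketched. With a small k each cell links only to its closest neighbors, so the graph stays fragmented; a larger k adds links between groups, which is why fewer clusters tend to result. The hypothetical base-R sketch below counts the connected components of a symmetrised kNN graph built on toy 1-D “cells”, as a rough proxy for this effect.

```r
# Hypothetical 1-D "cells" forming three well-separated groups of four
x <- c(0, 1, 2, 3, 10, 11, 12, 13, 30, 31, 32, 33)
d <- as.matrix(dist(x))

# Symmetrised kNN adjacency: i and j are linked if either is among the other's k nearest
knn_adjacency <- function(d, k) {
  n <- nrow(d)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n)) {
    others <- order(d[i, ])[-1]  # drop the cell itself (distance 0)
    adj[i, others[seq_len(k)]] <- TRUE
  }
  adj | t(adj)
}

# Number of connected components (depth-first search with an explicit stack)
count_components <- function(adj) {
  seen <- rep(FALSE, nrow(adj))
  n_comp <- 0
  for (s in seq_len(nrow(adj))) {
    if (seen[s]) next
    n_comp <- n_comp + 1
    stack <- s
    while (length(stack) > 0) {
      v <- stack[length(stack)]
      stack <- stack[-length(stack)]
      if (seen[v]) next
      seen[v] <- TRUE
      stack <- c(stack, which(adj[v, ] & !seen))
    }
  }
  n_comp
}

n_components <- sapply(c(2, 3, 4), function(k) count_components(knn_adjacency(d, k)))
print(n_components)
```

With k = 2 or k = 3 every neighbor lies inside a cell's own group of four, so the three groups stay separate; with k = 4 each cell must link outside its group, and everything joins into one component.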
cds <- cluster_cells(cds, k = 5, random_seed = 5)
p1 <- plot_cells(cds, show_trajectory_graph = FALSE)
p2 <- plot_cells(cds, color_cells_by = "partition", show_trajectory_graph = FALSE)
wrap_plots(p1, p2)
The UMAP on the left looks under-clustered compared to our original
clustering with Seurat. We’d probably need to tweak more parameters to
get Monocle to match the Seurat clustering. We don’t necessarily need to
do that, because our cds object still contains the
Seurat cluster metadata (examine this with
head(colData(cds))). More importantly, our partition
plot looks more sensible and no longer lumps all cells
into one supercluster.
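To quantify how closely two clusterings agree, one option is to cross-tabulate them and compute the adjusted Rand index (ARI), which is 1 for identical groupings and around 0 for random ones. In the real analysis the two label vectors could come from colData(cds)$ident and monocle3's clusters(cds); the vectors below are hypothetical, and the function is a plain base-R implementation of the standard ARI formula.

```r
# Adjusted Rand index between two labelings of the same cells (base R)
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)                    # contingency table
  comb2 <- function(n) n * (n - 1) / 2  # number of unordered pairs
  sum_ij <- sum(comb2(tab))
  sum_a <- sum(comb2(rowSums(tab)))
  sum_b <- sum(comb2(colSums(tab)))
  expected <- sum_a * sum_b / comb2(sum(tab))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Hypothetical labels for six cells: Seurat identities vs Monocle cluster numbers
seurat_ids <- c("Naive CD4 T", "Naive CD4 T", "Memory CD4 T", "Memory CD4 T", "B", "B")
monocle_ids <- c(1, 1, 1, 1, 2, 2)

print(table(seurat_ids, monocle_ids))
ari <- adjusted_rand_index(seurat_ids, monocle_ids)
print(ari)
```

In this toy case the two T cell identities are merged into one Monocle cluster, so the ARI is well below 1 even though no identity is split across clusters.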
Let’s re-run the learn_graph step:
cds <- learn_graph(cds)
plot_cells(cds, color_cells_by = "partition", label_groups_by_cluster = FALSE, label_leaves = FALSE,
label_branch_points = FALSE)
Monocle now ‘correctly’ builds trajectories that recognize distinct cell lineages.
We might choose to remove the B cells, monocytes and other non-T cells and focus on the CD4+ and CD8+ T cells, as together they form the largest group of cells.
# Create a vector of idents to keep
selected_ids <- c("Naive CD4 T", "Memory CD4 T", "CD8 T")
tcells_pbmc <- subset(pbmc, idents = selected_ids) ## subset the PBMC seurat object to tcells
cds <- as.cell_data_set(tcells_pbmc) ## convert this to cell_data_set
cds <- cluster_cells(cds)
cds <- learn_graph(cds)
plot_cells(cds, label_groups_by_cluster = FALSE, label_leaves = FALSE, label_branch_points = FALSE)
The next step is to order cells in pseudotime:
Pseudotime is a measure of how much progress an individual cell has made through a process such as cell differentiation.
In many biological processes, cells do not progress in perfect synchrony. In single-cell expression studies of processes such as cell differentiation, captured cells might be widely distributed in terms of progress. That is, in a population of cells captured at exactly the same time, some cells might be far along, while others might not yet even have begun the process. This asynchrony creates major problems when you want to understand the sequence of regulatory changes that occur as cells transition from one state to the next. Tracking the expression across cells captured at the same time produces a very compressed sense of a gene’s kinetics, and the apparent variability of that gene’s expression will be very high.
By ordering each cell according to its progress along a learned trajectory, Monocle alleviates the problems that arise due to asynchrony. Instead of tracking changes in expression as a function of time, Monocle tracks changes as a function of progress along the trajectory, which we term “pseudotime”. Pseudotime is an abstract unit of progress: it’s simply the distance between a cell and the start of the trajectory, measured along the shortest path. The trajectory’s total length is defined in terms of the total amount of transcriptional change that a cell undergoes as it moves from the starting state to the end state.
Source: Monocle’s documentation
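Following the definition quoted above, pseudotime is the shortest-path distance from the root along the trajectory. The base-R sketch below computes this with Dijkstra's algorithm on a small, hypothetical branching principal graph (node names and edge lengths are made up). In Monocle, cells are first projected onto the learned graph; this sketch only computes distances between graph nodes.

```r
# Hypothetical principal graph: 5 nodes, edge lengths in a symmetric matrix (Inf = no edge)
nodes <- paste0("Y", 1:5)
w <- matrix(Inf, 5, 5, dimnames = list(nodes, nodes))
add_edge <- function(w, a, b, len) {
  w[a, b] <- len
  w[b, a] <- len
  w
}
w <- add_edge(w, "Y1", "Y2", 1.0)
w <- add_edge(w, "Y2", "Y3", 2.0)  # one branch
w <- add_edge(w, "Y2", "Y4", 0.5)  # the other branch
w <- add_edge(w, "Y4", "Y5", 1.5)

# Dijkstra's algorithm: shortest-path distance from the root to every node
shortest_from <- function(w, root) {
  dist_to <- setNames(rep(Inf, nrow(w)), rownames(w))
  dist_to[root] <- 0
  done <- setNames(rep(FALSE, nrow(w)), rownames(w))
  while (!all(done)) {
    candidates <- dist_to
    candidates[done] <- Inf
    if (is.infinite(min(candidates))) break  # remaining nodes are unreachable
    u <- names(which.min(candidates))
    done[u] <- TRUE
    # relax every edge leaving u
    dist_to <- pmin(dist_to, dist_to[u] + w[u, ])
  }
  dist_to
}

pseudotime <- shortest_from(w, root = "Y1")
print(pseudotime)
```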
Monocle needs to be told where the ‘beginning’ of the biological
process is. There are a variety of ways to determine this;
the Monocle documentation provides a custom function that finds the root of the
trajectory based on a subset of cells. If the order_cells
function is called without specifying the root, it launches an
interactive interface in which we can directly select the cells we think are at the
beginning of the trajectory.
# a helper function to identify the root principal points:
get_earliest_principal_node <- function(cds, cell_type = "Naive CD4 T") {
  # indices of cells annotated with the chosen starting cell type
  cell_ids <- which(colData(cds)[, "ident"] == cell_type)
  # for every cell, the principal graph vertex it is closest to
  closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  # name of the vertex that is closest to the largest number of starting cells
  root_pr_nodes <- igraph::V(principal_graph(cds)[["UMAP"]])$name[
    as.numeric(names(which.max(table(closest_vertex[cell_ids, ]))))
  ]
  root_pr_nodes
}
cds <- order_cells(cds, root_pr_nodes = get_earliest_principal_node(cds))
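The core of get_earliest_principal_node() is a mode calculation: among the cells of the chosen starting type, find the principal graph vertex they are most often closest to. The base-R sketch below isolates that step on hypothetical per-cell annotations and vertex indices; the real helper then looks up the vertex name with igraph::V().

```r
# Hypothetical per-cell annotations and the index of each cell's closest graph vertex
ident          <- c("Naive CD4 T", "Naive CD4 T", "CD8 T", "Naive CD4 T", "Memory CD4 T")
closest_vertex <- c(3, 3, 7, 5, 1)

# Most frequent closest vertex among cells of the chosen starting type
pick_root_vertex <- function(ident, closest_vertex, cell_type) {
  counts <- table(closest_vertex[ident == cell_type])
  as.numeric(names(which.max(counts)))
}

root_vertex <- pick_root_vertex(ident, closest_vertex, "Naive CD4 T")
print(root_vertex)
```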
We can now plot the trajectory and color cells by pseudotime:
plot_cells(cds, color_cells_by = "pseudotime", label_cell_groups = FALSE, label_leaves = FALSE,
label_branch_points = FALSE)
plot_cells(cds, color_cells_by = "ident", label_cell_groups = FALSE, label_leaves = FALSE, label_branch_points = FALSE)
Discuss the results of this pseudotime trajectory (remembering this is a bogus example):
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.2.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] celldex_1.4.0 SingleR_1.8.1
## [3] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [5] Biobase_2.54.0 GenomicRanges_1.46.1
## [7] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [9] S4Vectors_0.32.3 BiocGenerics_0.40.0
## [11] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [13] patchwork_1.1.1 SeuratObject_4.0.4
## [15] Seurat_4.1.0 ggplot2_3.3.5
## [17] dplyr_1.0.8 stringr_1.4.0
##
## loaded via a namespace (and not attached):
## [1] AnnotationHub_3.2.1 BiocFileCache_2.2.1
## [3] plyr_1.8.6 igraph_1.2.11
## [5] lazyeval_0.2.2 splines_4.1.1
## [7] BiocParallel_1.28.3 listenv_0.8.0
## [9] scattermore_0.8 digest_0.6.29
## [11] htmltools_0.5.2 fansi_1.0.2
## [13] memoise_2.0.1 magrittr_2.0.2
## [15] ScaledMatrix_1.2.0 tensor_1.5
## [17] cluster_2.1.2 ROCR_1.0-11
## [19] Biostrings_2.62.0 globals_0.14.0
## [21] spatstat.sparse_2.1-0 colorspace_2.0-3
## [23] rappdirs_0.3.3 blob_1.2.2
## [25] ggrepel_0.9.1 xfun_0.29
## [27] crayon_1.5.0 RCurl_1.98-1.6
## [29] jsonlite_1.8.0 spatstat.data_2.1-2
## [31] survival_3.2-13 zoo_1.8-9
## [33] glue_1.6.1 polyclip_1.10-0
## [35] gtable_0.3.0 zlibbioc_1.40.0
## [37] XVector_0.34.0 leiden_0.3.9
## [39] DelayedArray_0.20.0 BiocSingular_1.10.0
## [41] future.apply_1.8.1 abind_1.4-5
## [43] scales_1.1.1 DBI_1.1.2
## [45] spatstat.random_2.1-0 miniUI_0.1.1.1
## [47] Rcpp_1.0.8 viridisLite_0.4.0
## [49] xtable_1.8-4 reticulate_1.24
## [51] spatstat.core_2.4-0 bit_4.0.4
## [53] rsvd_1.0.5 htmlwidgets_1.5.4
## [55] httr_1.4.2 RColorBrewer_1.1-2
## [57] ellipsis_0.3.2 ica_1.0-2
## [59] pkgconfig_2.0.3 farver_2.1.0
## [61] dbplyr_2.1.1 sass_0.4.0
## [63] uwot_0.1.11 deldir_1.0-6
## [65] utf8_1.2.2 AnnotationDbi_1.56.2
## [67] tidyselect_1.1.2 labeling_0.4.2
## [69] rlang_1.0.1 reshape2_1.4.4
## [71] later_1.3.0 BiocVersion_3.14.0
## [73] cachem_1.0.6 munsell_0.5.0
## [75] tools_4.1.1 cli_3.2.0
## [77] ExperimentHub_2.2.1 RSQLite_2.2.10
## [79] generics_0.1.2 ggridges_0.5.3
## [81] evaluate_0.15 fastmap_1.1.0
## [83] yaml_2.3.5 goftest_1.2-3
## [85] bit64_4.0.5 knitr_1.37
## [87] fitdistrplus_1.1-6 purrr_0.3.4
## [89] RANN_2.6.1 KEGGREST_1.34.0
## [91] sparseMatrixStats_1.6.0 pbapply_1.5-0
## [93] future_1.24.0 nlme_3.1-155
## [95] mime_0.12 formatR_1.11
## [97] compiler_4.1.1 interactiveDisplayBase_1.32.0
## [99] filelock_1.0.2 curl_4.3.2
## [101] plotly_4.10.0 png_0.1-7
## [103] spatstat.utils_2.3-0 tibble_3.1.6
## [105] bslib_0.3.1 stringi_1.7.6
## [107] highr_0.9 RSpectra_0.16-0
## [109] lattice_0.20-45 Matrix_1.4-0
## [111] vctrs_0.3.8 pillar_1.7.0
## [113] lifecycle_1.0.1 BiocManager_1.30.16
## [115] spatstat.geom_2.3-2 lmtest_0.9-39
## [117] jquerylib_0.1.4 BiocNeighbors_1.12.0
## [119] RcppAnnoy_0.0.19 data.table_1.14.2
## [121] cowplot_1.1.1 bitops_1.0-7
## [123] irlba_2.3.5 httpuv_1.6.5
## [125] R6_2.5.1 promises_1.2.0.1
## [127] KernSmooth_2.23-20 gridExtra_2.3
## [129] parallelly_1.30.0 codetools_0.2-18
## [131] MASS_7.3-55 assertthat_0.2.1
## [133] withr_2.4.3 sctransform_0.3.3
## [135] GenomeInfoDbData_1.2.7 mgcv_1.8-38
## [137] parallel_4.1.1 beachmat_2.10.0
## [139] grid_4.1.1 rpart_4.1.16
## [141] tidyr_1.2.0 DelayedMatrixStats_1.16.0
## [143] rmarkdown_2.11 Rtsne_0.15
## [145] shiny_1.7.1